EXISTENCE, CHARACTERIZATION AND APPROXIMATION IN THE
GENERALIZED MONOTONE-FOLLOWER PROBLEM 

JIEXIAN LI AND GORDAN ZITKOVIC 


Abstract. We revisit the classical monotone-follower problem and consider it in a generalized
formulation. Our approach is based on a compactness substitute for nondecreasing processes, the
Meyer-Zheng weak convergence, and the maximum principle of Pontryagin. It establishes existence under
weak conditions, produces general approximation results and further elucidates the celebrated
connection between singular stochastic control and stopping.


1. Introduction 

A direct precursor to the monotone-follower problem dates back to the 1960s; the basic model
originated in engineering and first appeared in the work of Bather and Chernoff [BC67]. There, it
was posed in a model of a spaceship being steered towards a target with both precision and fuel
consumption appearing in the performance criterion. The authors observed an unexpected connection
between the control problem they studied and a Brownian optimal stopping problem based on the 
same ingredients; arguing quite incisively, but mostly on heuristic grounds, they demonstrated that 
the value function of the latter is the gradient of the value function of the former. 

In 1984, Karatzas and Shreve [KS84] considered a generalized version of the Bather-Chernoff
problem, dubbing it the “monotone follower problem”. In the same paper, using purely probabilistic tools,
they rigorously established the equivalence of the control and stopping problems under appropriate
continuity and growth conditions. Some time later, Haussmann and Suo [HS95] applied relaxation
and compactification methods, used the Meyer-Zheng convergence, and showed existence of the
optimal control under a different set of conditions. In 2005, Bank [Ban05] constructed a fairly explicit
control policy under a stochastic dynamic fuel constraint in one dimension. Subsequently, Budhiraja
and Ross [BR06] applied the Meyer-Zheng convergence to prove a general existence theorem, also
under a fuel constraint. Guo and Tomecek [GT08] generalized some results of [KS84] in a different
direction: they established a connection between singular control of finite variation and optimal
switching.

Problem formulation. The essence of the monotone follower problem is tracking, as closely as 
possible, a given random process L (the target) by a suitably constrained control process A (the 
follower). In the original setting of [KS84], the target is a Brownian motion, the follower is required 
to be adapted and non-decreasing, and the “closeness” is measured by applying an appropriate
functional to the state variable defined as the difference between the position of the target and the position
of the follower. Our version of this problem is generalized in two directions: 

(a) We allow the dynamics of both the target and the follower to be multidimensional and
impose weak assumptions on the distribution of the target L. For our existence and
characterization results (Theorems 2.7 and 2.12 below), we only require that L has càdlàg paths. For
the approximation (Theorem 2.21 below), we need L to be a Feller process (still allowing, in
particular, inhomogeneities in the cost structure). Also, we consider functionals which are functions of
the target and the follower, convex in the position of the follower, and not only functions of their
relative positions. Finally, we relax some of the growth assumptions; in particular, we do not require
superlinear growth of the cost function to obtain existence of an optimal control (as in, e.g., [Ban05],
where it serves as a sufficient condition for the existence of a solution to a stochastic representation
problem which, in turn, characterizes the optimizer).

(b) Our formulation is weak (distributional), in the sense that we are only interested in the joint
distribution of the follower and the target, without fixing the underlying filtered probability space
and making it a part of the problem. This enables us to prove an approximation result (Theorem 2.21
below) in great generality. On the other hand, as we will see below in Proposition 3.1, every weak
(distributional) solution can be turned into a strong one, under commonly met conditions, by a simple
projection operation. Moreover, as far as generality is concerned, any setup where the filtration is
generated by a finite number of càdlàg processes can be easily lifted to our canonical framework,
allowing us to work on a canonical (Skorokhod) space right from the start. It is worth noting
that (even though we do not provide details for such an approach here) even greater generality can
be achieved by considering Polish-space-valued càdlàg processes and their natural filtrations.

Our results. We treat questions of existence, approximability and characterization (via Pontryagin’s 
maximum principle), as well as connections with optimal stopping. These are tackled using a variety
of methods, including a compactness substitute for monotone processes and the Meyer-Zheng
convergence. Moreover, we posit the idea that 

the connection between control and stopping can be understood as the connection 

between the monotone-follower problem and its Pontryagin maximum principle. 

The original impetus for our research was twofold: 

(a) On the one hand, we wanted to understand the role played by different regularity and growth
conditions imposed in the existing literature in order to establish existence of optimal controls. This
led to an existence proof (Theorem 2.7 below) under less restrictive conditions on most ingredients.
The proof is based on a convenient substitute for compactness under convexity, and not on the
Meyer-Zheng topology as in some of the papers mentioned above. The beginnings of such an approach can
be traced back to the fundamental result of Komlós [Sch86], while the version used in the present
paper is due to Kabanov [Kab99].

(b) On the other hand - perhaps more importantly - we tried to grasp a more practical issue better,
namely, the approximation of the archetypically singular monotone-follower problem by a sequence 
of regular, absolutely continuous (even Lipschitz) control problems. To accomplish this task, the 
following conceptual framework was devised. First, a sequence of so-called “capped” problems 
where the exerted controls are constrained to be Lipschitz is posed. These regular problems come 
with increasing upper bounds on the Lipschitz constant and are expected to approach the monotone 
control problem both in value and in optimal controls. Being regular and well-behaved, each capped 
problem is expected to be solvable by the well-known classical methods; the resulting solution 
sequence is, then, expected to converge (in the appropriate sense) towards the solution to the original 
problem. 

The second, larger, part of the paper can be seen as the implementation of the above steps. The 
major difficulty we encountered was the lack of good equicontinuity estimates on the solutions to 
the capped problems. To overcome it we replaced the usual weak convergence under the Skorokhod 
topology with the versatile Meyer-Zheng convergence. Even so, we still needed to close the gap 
between the limit of the values of the capped problems and the value of the original problem. For that, 
we characterized the optimizers (both in the capped and the original problems) via the maximum 
principle of Pontryagin (i.e., the “first-order” condition) and passed to a Meyer-Zheng limit there. 

While the ideas described in the previous paragraphs seem to be new, the research relating Pontryagin’s
maximum principle to singular control problems is certainly not. Indeed, Pontryagin’s maximum
principle for singular control problems was first discussed by Cadenillas and Haussmann [CH94]
already in 1994. With Brownian dynamics, convex cost, and state constraints assumed, these authors
formulated the stochastic maximum principle in an integral form and gave necessary and sufficient
conditions for optimality. In order to solve the approximation problem via the maximum principle,
however, one must go beyond their work. Even though the last 20 years have seen an explosion of
activity in the general theory of BSDE and FBSDE (see, e.g., [MPY94], [CM96], [MY99], [MC01],
[AM03], [MZ11]), to the best of our knowledge none of the existing work seems to be able to deal
directly with the singular FBSDE that the maximum principle for the monotone-follower problem
yields, even in the Brownian case. Our route, via approximation and simultaneous consideration
of the related (capped) control problems, can be interpreted as a variational approach to a class of
singular FBSDE and may, perhaps, be of use in other situations, as well. For example, a combination
(see Corollary 2.14 below) of our existence and characterization results, i.e., Theorems 2.7 and
2.12, guarantees existence of solutions of such FBSDE under weak, monotonicity- and
exponential-growth-type assumptions on the nonlinearities.

The approximation result (Theorem 2.21 below) serves as a pleasant justification of singular
controls as a conceptual limit of absolutely continuous controls. Moreover, together with the related
maximum-principle characterization of the optimal controls in the original problem, it leads us to
view the celebrated connection between stopping and control in a new light. Indeed, once such a
characterization is formulated, it is a simple observation that it can be re-interpreted as an optimal 
stopping problem, which turns out to be precisely the optimal stopping problem identified by Bather 
and Chernoff and rigorously studied by Karatzas and Shreve. 

Organization of the paper. After this Introduction, Section 2 contains the formulation of the
problem, a description of the probabilistic setup it is defined on, and the main results. Section 3 is devoted
to proofs. At the end, a short compendium of the most important well-known results on the
Meyer-Zheng topology, including the tightness criteria, is given in Appendix A.

2. The Problem and the Main Results 

2.1. Notational conventions and the canonical setup. For $N \in \mathbb{N}$, let $\mathcal{D}^N$ denote the Skorokhod
space, i.e., the measurable space of all $\mathbb{R}^N$-valued càdlàg functions on $[0, T]$, equipped with the
$\sigma$-algebra generated by the coordinate maps. Since the same $\sigma$-algebra appears as the Borel $\sigma$-algebra
generated by the Skorokhod topology, as well as by most of the other popular topologies on $\mathcal{D}^N$, we
call it simply the Borel $\sigma$-algebra. The set of all probability measures on the Borel $\sigma$-algebra of
$\mathcal{D}^N$ is denoted by $\mathfrak{P}^N$. The probabilistic notation $\mathbb{E}^{\mathbb{P}}[\cdot]$ is used to denote the integration with respect to
a probability measure $\mathbb{P}$ in $\mathfrak{P}^N$.

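The components of the coordinate process $X$ on $\mathcal{D}^N$ are generally denoted by $X^1, \dots, X^N$. Given
a subset $X^{i_1}, \dots, X^{i_K}$, with $K \le N$, of the components of $X$, we denote by $\pi_{X^{i_1}, \dots, X^{i_K}}$ the
projection map $\mathcal{D}^N \to \mathcal{D}^K$. For $\mathbb{P} \in \mathfrak{P}^N$, $\pi_{X^{i_1}, \dots, X^{i_K}}$ induces a probability measure on $\mathcal{D}^K$,
which we call the $(X^{i_1}, \dots, X^{i_K})$-marginal of $\mathbb{P}$ and denote simply by $\mathbb{P}_{X^{i_1}, \dots, X^{i_K}}$.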

Often, we group sets of variables into single-named vector-valued components to increase
readability. The dimensionality of these components will always be clear from the context, with the
definition of the marginal extending naturally. To make it easier for the reader, we often employ
notation of the form $\mathcal{D}^{d+k}(L, A)$ or $\mathfrak{P}^{d+k}(L, A)$ to signal the fact that the first $d$ coordinates are
collectively denoted by $L$ and the remaining $k$ by $A$. In the same spirit, we consider (raw)
filtrations of the form $\mathbb{F}^Y = \{\mathcal{F}^Y_t\}_{t \in [0,T]}$, $\mathcal{F}^Y_t = \sigma(Y_s,\ s \le t)$, $t \in [0, T]$, on $\mathcal{D}^N$, with $Y$
denoting some (or all) components of $X$. The notation for their right-continuous enlargements is
$\mathbb{F}^Y_+ = \{\mathcal{F}^Y_{t+}\}_{t \in [0,T]}$, where $\mathcal{F}^Y_{t+} = \bigcap_{s > t} \mathcal{F}^Y_s$. Unless explicitly stated otherwise, the usual conditions
of right-continuity and completeness are not assumed. When the filtration is, indeed, completed, and
the measure $\mathbb{P}$ under which the filtration is completed is clear from the context, we add a bar above
$\mathbb{F}$ (as in $\bar{\mathbb{F}}^Y_+$, e.g.).

Some of the components of the coordinate process will naturally come with further constraints,
most often in the form of monotonicity: the subset $\mathcal{D}^N_{\uparrow}$ of $\mathcal{D}^N$ denotes the class of
(componentwise) nondecreasing paths $A$ with $A_0 \ge 0$ (this is natural in our context because we will think of
all functions as taking the value 0 on $(-\infty, 0)$). If monotonicity is required only for a subset of
components, the suggestive notation $\mathcal{D}^{K_1+K_2}_{\cdot,\uparrow}$ is used. The intended meaning is that only the last
$K_2$ components are assumed to be nondecreasing. Similarly, if the monotonicity requirement is
replaced by that of finite variation, the resulting family is denoted by $\mathcal{D}^N_{fv}$ (unlike in the case of $\mathcal{D}^N_{\uparrow}$,
no nonnegativity requirement on $A_0$ is imposed for $\mathcal{D}^N_{fv}$). Analogous notation will be used for sets
of probability measures, as well.

For a one-dimensional path $A$ of finite variation and a measurable (sufficiently integrable) function
$f : [0, T] \to \mathbb{R}$, we use the appropriately-adjusted version of the Stieltjes integral. Namely, we define

$$\int_{[0,T]} f(t)\, dA_t = f(0) A_0 + \int_0^T f(t)\, dA_t,$$

where the integral on the right is the standard Lebesgue-Stieltjes integral on $(0, T]$ of $f$ with respect
to $A$. This corresponds to the interpretation of the process $A$ as having a jump of size $A_0$ just prior to
time 0. This way, we can incorporate an initial jump in the process $A$ while staying in the standard
càdlàg framework; the price we are comfortable with paying is that the implicit value $A_{0-} = 0$ has
to be fixed. For multidimensional integrators and integrands, the same conventions will be used,
with the usual interpretation of the multivariate integral as the sum of the component-wise integrals.
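
The convention is easiest to see on a pure-jump path. The following Python sketch is illustrative only: the function name `adjusted_stieltjes` and the representation of a path by its initial value together with a list of (time, size) jumps are assumptions of this example, not notation from the text.

```python
from typing import Callable, Iterable, Tuple


def adjusted_stieltjes(
    f: Callable[[float], float],
    a0: float,
    jumps: Iterable[Tuple[float, float]],
) -> float:
    """Adjusted Stieltjes integral of f against a pure-jump path A on [0, T].

    The path starts at A_0 = a0 and has jumps (t, size) at times t in (0, T].
    The initial value a0 is treated as a jump just prior to time 0, so it
    contributes f(0) * a0 on top of the standard integral over (0, T].
    """
    initial_term = f(0.0) * a0
    standard_part = sum(f(t) * size for t, size in jumps)
    return initial_term + standard_part


# Hypothetical data: f(t) = 2 + t, A_0 = 1.5, one jump of size 1 at t = 0.5.
value = adjusted_stieltjes(lambda t: 2.0 + t, 1.5, [(0.5, 1.0)])
```

Dropping the term `f(0.0) * a0` would recover the standard integral over (0, T], which ignores the cost of the initial jump.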

2.2. The monotone-follower problem. Given $d, k \in \mathbb{N}$, we consider the path space $\mathcal{D}^{d+k}_{\cdot,\uparrow}(L, A)$,
where $L$ plays the role of the target and $A$ the (controlled) monotone follower. As mentioned above,
the natural, raw, filtrations generated by the processes $L$ and $A$ are denoted by $\mathbb{F}^L = \{\mathcal{F}^L_t\}_{t \in [0,T]}$
and $\mathbb{F}^A = \{\mathcal{F}^A_t\}_{t \in [0,T]}$, respectively. A central object in the problem’s setup is the probability
measure $\mathbb{P}_0$ on $\mathcal{D}^d$, which we interpret as the law of the dynamics of the target. No additional
assumptions are placed on it at this point, but for some of our results to hold, we will need to require
more structure later. On the other hand, all our results go through if $L$ is assumed to take values
in a Hausdorff locally-compact topological space with countable base instead of $\mathbb{R}^d$, but we keep
everything Euclidean for simplicity.

In the spirit of our weak approach, we control the follower by choosing its joint distribution with
the target $L$, in a suitably defined admissibility class. In the definition below, the condition $\mathbb{P}_L = \mathbb{P}_0$
ensures that $L$ has the prescribed marginal distribution, while the conditional-independence
requirement imposes a form of non-anticipativity on the control:

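Definition 2.1 (Admissible controls). A probability $\mathbb{P} \in \mathfrak{P}^{d+k}_{\cdot,\uparrow}(L, A)$ is called admissible, denoted
by $\mathbb{P} \in \mathcal{A}$, if

(1) $\mathbb{P}_L = \mathbb{P}_0$, and

(2) for each $t \ge 0$, conditionally on $\mathcal{F}^L_{t+}$, the $\sigma$-algebras $\mathcal{F}^A_t$ and $\mathcal{F}^L_T$ are $\mathbb{P}$-independent.

If, additionally, $\mathcal{F}^A_t \subseteq \mathcal{F}^L_{t+}$, for all $t \in [0, T]$, up to $\mathbb{P}$-negligible sets, we say that $\mathbb{P}$ is strongly
admissible.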

Remark 2.2. The condition (2) in Definition 2.1 above can be thought of as a non-anticipativity
constraint where additional, $L$-independent, randomization is allowed; it is a version of the so-called
hypothesis $(\mathcal{H})$ of Brémaud and Yor (see [BY78]). We point out that the choice of the right-continuous
augmentation $\mathbb{F}^L_+$ is crucial for our results to hold (see Example 2.9 below), but also that it reverts
to the usual hypothesis $(\mathcal{H})$ as soon as a version of Blumenthal’s 0-1 law holds for $L$.


The quality of the tracking job is measured by a nonnegative convex cost functional:

Definition 2.3 (Cost functionals). A map $C : \mathcal{D}^{d+k}_{\cdot,\uparrow}(L, A) \to [0, \infty]$ is called a cost functional if
there exist measurable functions

$$f : [0, T] \to [0, \infty)^k, \quad g : \mathbb{R}^d \times [0, \infty)^k \to [0, \infty) \quad \text{and} \quad h : \mathbb{R}^d \times [0, \infty)^k \to [0, \infty),$$

such that $f$ is continuous, $h(l, \cdot)$ and $g(l, \cdot)$ are convex on $[0, \infty)^k$, for each $l \in \mathbb{R}^d$, and

$$C(L, A) = \int_{[0,T]} f(t)\, dA_t + \int_0^T h(L_t, A_t)\, dt + g(L_T, A_T).$$
Remark 2.4. The role of the process L in the cost functional C above is two-fold. Some of its 
components play the role of the target to be tracked, while the others allow the functions h and g to 
depend on time or on the randomness from the environment. We enforce this interpretation in the 
sequel by making as few assumptions on L as possible, in particular about its relation to A. See
Remark 2.22, (2), as well.

Definition 2.5 (Cost associated with a control). Given a cost functional $C$ and an admissible
probability $\mathbb{P} \in \mathcal{A}$, the (expected) cost $J(\mathbb{P})$ of $\mathbb{P}$ is given by

$$J(\mathbb{P}) = \mathbb{E}^{\mathbb{P}}[C(L, A)] \in [0, \infty],$$

where $L$ and $A$ denote the components of dimensions $d$ and $k$, respectively, in $\mathcal{D}^{d+k}$.
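
For intuition, the cost functional can be evaluated pathwise on a time grid. The sketch below is a minimal illustration for d = k = 1; the name `cost_functional`, the grid representation of the paths, and the left-point Riemann sum are assumptions of this example. Averaging its output over simulated pairs of paths would give a Monte Carlo estimate of the expected cost.

```python
from typing import Callable, Sequence


def cost_functional(
    times: Sequence[float],
    L: Sequence[float],
    A: Sequence[float],
    f: Callable[[float], float],
    h: Callable[[float, float], float],
    g: Callable[[float, float], float],
) -> float:
    """Discretized cost: control cost + running cost + terminal cost.

    times is a grid 0 = t_0 < ... < t_m = T; L and A hold the path values on
    the grid, with A nondecreasing and A[0] >= 0.
    """
    if not (len(times) == len(L) == len(A)):
        raise ValueError("times, L and A must have the same length")
    # The initial value A[0] is charged as a jump at time 0.
    control = f(times[0]) * A[0]
    running = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        # Each increment of A is charged at the grid time where it appears.
        control += f(times[i]) * (A[i] - A[i - 1])
        # Left-point Riemann sum for the running cost.
        running += h(L[i - 1], A[i - 1]) * dt
    terminal = g(L[-1], A[-1])
    return control + running + terminal


# Hypothetical data: f = 1, h(l, a) = |l - a|, g(l, a) = (l - a)^2 / 2.
example_cost = cost_functional(
    [0.0, 0.5, 1.0],
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 3.0],
    lambda t: 1.0,
    lambda l, a: abs(l - a),
    lambda l, a: 0.5 * (l - a) ** 2,
)
```

In this example the control cost is 3, the running cost is 0.5 and the terminal cost is 0.5.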

Definition 2.6 (Value and solution concepts). The value of the monotone-follower problem is given
by

$$V = \inf_{\mathbb{P} \in \mathcal{A}} J(\mathbb{P}).$$

A probability measure $\mathbb{P} \in \mathcal{A}$ is said to be a weak solution to the monotone-follower problem if
$J(\mathbb{P}) < \infty$ and $V = J(\mathbb{P})$. If such $\mathbb{P}$ is strongly admissible, we say that the solution is strong.
For $\varepsilon > 0$, a (weak or strong) $\varepsilon$-optimal solution is a (weakly or strongly) admissible $\mathbb{P}$ with
$J(\mathbb{P}) < V + \varepsilon$.

2.3. An existence result. Our first result establishes existence in the monotone-follower problem
(Definition 2.6) under weak conditions. Here, and in the sequel, $|\cdot|$ denotes the Euclidean norm on
$\mathbb{R}^k$.

Theorem 2.7 (Existence under linear coercivity). Suppose that the cost functional $C$ is linearly
coercive, i.e., that there exist constants $\kappa, K > 0$ such that

(2.1) $\mathbb{E}^{\mathbb{P}}[C(L, A)] \ge \kappa\, \mathbb{E}^{\mathbb{P}}[|A_T|]$, for all $\mathbb{P} \in \mathcal{A}$ with $\mathbb{E}^{\mathbb{P}}[|A_T|] \ge K$.

Then the monotone-follower problem admits a strong solution whenever its value is finite.

Remark 2.8. The reader will immediately notice that the linear coercivity condition (2.1) is a fairly
weak requirement, guaranteed by either strict positivity of $f$, or uniform (over $l$) boundedness from
below of the function $g$ by a strictly increasing linear function in $a$, for large $a$. Small modifications
of our results can be made to deal with the case $g = 0$, when similar, linear, coercivity is asked of
$h$. Similarly, one can relax (2.1) even further by passing to an equivalent probability measure on
the right-hand side. We leave details to the reader who comes across a situation in which such an
extension is needed.

The following two examples show that neither one of the two major conditions - linear coercivity
(2.1) in Theorem 2.7, or the use of the right-continuous augmentation $\mathbb{F}^L_+$ in the definition of
admissibility (Definition 2.1) - can be significantly relaxed:


Example 2.9 (Necessity of assumptions). As for the coercivity assumption (2.1), a trivial example
can be constructed with $T = 1$, $f = 0$, $h = 0$, $k = d = 1$, $g(l, a) = e^{-a}$ and an arbitrary $\mathbb{P}_0$. The
value of the problem is clearly 0, but no minimizer exists. Linear coercivity clearly fails, too.
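
The first half of the example can be checked numerically on deterministic constant controls, which are admissible because they carry no information about the target. The snippet below is a small illustration under that assumption; the helper name `constant_control_cost` and the chosen levels are hypothetical.

```python
import math


def constant_control_cost(a: float) -> float:
    """Expected cost of the constant control A_t = a when f = h = 0 and
    g(l, a) = exp(-a); the target plays no role in this cost."""
    return math.exp(-a)


levels = [1.0, 10.0, 100.0]
costs = [constant_control_cost(a) for a in levels]
# Ratio of the expected cost to the expected terminal position of the control.
ratios = [c / a for c, a in zip(costs, levels)]
```

The costs decrease towards 0 without ever reaching it, so the infimum is not attained, and the ratios tend to 0, so no positive constant can serve in (2.1).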

In order to argue that the right-continuous augmentation $\mathbb{F}^L_+$ in Definition 2.1 is necessary, we
take $T = 1$ and assume that the dynamics of the target satisfies

Lt = tLi forf G [0, l],Po-a.s., withPo[Li = 1] = Po[Ti = 0] = 

and that the cost functional is given by 

C(L, A) = f d-t) dAt + f \Lt — Ai;\ dt. 

./[o.i] Jo 

Let P* G ^‘^+’^{L,A) be such that P* = Pq and A; = i^i, for all t G [0,1], P*-a.s. Since the 
admissibility requires that cr(Ao) and (t{Lt) be independent, P* is not admissible. It does have the 
property that 

(2.2) J(P*) < J(P), for each P G with Pl = Pq. 


Indeed, one can check that 


C'(O) 0) < (7(0, a) and C{b, \i) < C{l, a) for all a G D|, 


where 6 and 0 denote the identity and the constant 0 function on [0,1], respectively. Moreover, the 
inequality in (2.2) is an equality if and only if P = P*. Thus, to show that no admissible minimizer 
exists it will be enough to find a sequence {Pn}nGN in A such that J(Pri) \ J(P*). This can be 
achieved easily by using the Po-laws of (A", L), where 


A" 



t < 


n ’ 


1 ], 


2.4. A characterization result. Using the same ingredients as in the formulation of the
monotone-follower problem, we pose a forward-backward-type stochastic equation (called the Pontryagin
FBSDE), as a formulation of the maximum principle of Pontryagin. Whenever the Pontryagin
FBSDE is involved, we automatically assume that both $a \mapsto h(l, a)$ and $a \mapsto g(l, a)$ are continuously
differentiable in $a$ on $[0, \infty)^k$ for each $l$, and denote their gradients (in $a$) by $\nabla h$ and $\nabla g$, respectively.
Any inequalities between multidimensional processes are to be understood componentwise.

Definition 2.10 (The Pontryagin FBSDE). A probability $\mathbb{P} \in \mathfrak{P}^{d+2k}(L, A, Y)$ is said to be a weak
solution of the Pontryagin FBSDE if

(1) $\mathbb{P}_{L,A} \in \mathcal{A}$,

(2) $Y \ge 0$ and $\int_{[0,T]} Y_t\, dA_t = 0$, $\mathbb{P}$-a.s.,

(3) $Y + \int_0^{\cdot} \nabla h(L_t, A_t)\, dt - f$ is an $(\bar{\mathbb{F}}^{L,A}_{+}, \mathbb{P})$-martingale with $Y_T = f(T) + \nabla g(L_T, A_T)$, $\mathbb{P}$-a.s.

Remark 2.11. Under $\mathbb{P}$ as above, $(L, A, Y)$ can be interpreted as a (weak) solution to a fully-coupled
stochastic forward-backward differential equation with reflection. Indeed, the forward component
$(L, A)$ feeds into the backward component $Y$ directly (and through the terminal condition). On the
other hand, the backward component affects the forward component through the reflection term in
Definition 2.10, (2). The usual stochastic-representation parameter $Z$ is hidden in our formulation
(in the martingale property of $Y$, as we do not assume the predictable-representation property in any
form) and it does not feed directly into the dynamics. For that reason, it would perhaps be more
appropriate to call (1)-(3) above a forward-backward stochastic equation (FBSE) instead of FBSDE;
we choose to stick to the canonical nomenclature, nevertheless.

The main significance of the Pontryagin FBSDE lies in the following characterization:

Theorem 2.12 (Characterization via the Pontryagin FBSDE). Suppose that the functions $g(l, \cdot)$ and
$h(l, \cdot)$ are convex and continuously differentiable on $[0, \infty)^k$ for each $l \in \mathbb{R}^d$.

(1) Suppose that there exist Borel functions $\Phi_g, \Phi_h : \mathbb{R}^d \to [0, \infty)$ and a constant $M > 0$, such
that

$$\mathbb{E}^{\mathbb{P}_0}\Big[ \Phi_g(L_T) + \int_0^T \Phi_h(L_t)\, dt \Big] < \infty$$

and, for $\varphi \in \{g, h\}$,

$$|\nabla \varphi(l, a)| \le \Phi_\varphi(l) + M \varphi(l, a), \quad \text{for all } (l, a) \in \mathbb{R}^d \times [0, \infty)^k.$$

Then each solution $\mathbb{P}$ of the monotone-follower problem is an $(L, A)$-marginal of some
solution $\hat{\mathbb{P}}$ to the Pontryagin FBSDE.

(2) If the Pontryagin FBSDE admits a solution $\hat{\mathbb{P}}$, then its marginal $\hat{\mathbb{P}}_{L,A}$ is a solution of the
monotone-follower problem whenever its value is finite.

Remark 2.13.

(1) Our Pontryagin FBSDE can be interpreted as a weakly-formulated version of (stochastic)
first-order conditions. These can be found in the literature, in settings similar to ours, and
in the context of singular control, e.g., in [BR01], [Ban05], or, more recently, [Ste12].

(2) The condition in (1) above essentially states that $g$ grows no faster than an exponential
function, with the parameter uniformly bounded from above in $l$. This should be compared
to virtually no growth condition needed for existence in Theorem 2.7, as well as to the
polynomial growth conditions needed for the approximation result in Theorem 2.21 below.
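
The first-order interpretation in part (1) of the remark can be made plausible by a formal perturbation argument. The following LaTeX sketch is heuristic and is not the proof given in the paper: it takes $k = 1$, writes $J(A)$ for the expected cost of a control $A$ under a fixed probability, uses an unspecified filtration $(\mathcal{F}_t)$ to which $L$ and $A$ are adapted, ignores the question of whether the perturbed control is admissible, and exchanges limits and expectations without justification.

```latex
% Heuristic sketch (k = 1). Fix a stopping time \tau \le T and \varepsilon > 0,
% and perturb the control by A^{\varepsilon} = A + \varepsilon 1_{[\tau, T]}.
\begin{align*}
C(L, A^{\varepsilon}) - C(L, A)
  &= \varepsilon f(\tau)
   + \int_{\tau}^{T} \big[ h(L_t, A_t + \varepsilon) - h(L_t, A_t) \big]\, dt
   + g(L_T, A_T + \varepsilon) - g(L_T, A_T), \\
\lim_{\varepsilon \downarrow 0} \frac{J(A^{\varepsilon}) - J(A)}{\varepsilon}
  &= \mathbb{E}\Big[ f(\tau) + \int_{\tau}^{T} \nabla h(L_t, A_t)\, dt
     + \nabla g(L_T, A_T) \Big]
   = \mathbb{E}[Y_{\tau}], \\
\text{where}\quad
Y_t &:= f(t) + \mathbb{E}\Big[ \int_t^T \nabla h(L_s, A_s)\, ds
     + \nabla g(L_T, A_T) \,\Big|\, \mathcal{F}_t \Big].
\end{align*}
```

If $A$ is optimal, the limit is nonnegative for every $\tau$, which suggests $Y \ge 0$. By construction, $Y + \int_0^{\cdot} \nabla h(L_t, A_t)\, dt - f$ is a martingale and $Y_T = f(T) + \nabla g(L_T, A_T)$. The remaining condition, that $Y$ vanishes where $A$ increases, reflects the fact that removing a small amount of control at such times cannot lower the cost either.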

While we will be using the Pontryagin FBSDE mostly as a tool in the proof of Theorem 2.21, we
believe that the following result, which is an immediate consequence of Theorems 2.7 and 2.12
above, merits mention in its own right.

Corollary 2.14 (Existence for the Pontryagin FBSDE). Under the combined assumptions of
Theorems 2.7 and 2.12, part (1), the Pontryagin FBSDE admits a solution, as soon as the value of the
monotone-follower problem is finite.

Remark 2.15. We do not discuss uniqueness of solutions in detail either in the context of Theorem
2.7 above, or in the context of our other results below. In particular cases, clearly, the strong solution
will be unique if enough strict convexity is assumed on the problem ingredients.


2.5. A connection with optimal stopping. In our next result, we revisit, and, more importantly,
reinterpret, the celebrated connection between optimal stopping and stochastic control in the
context of the generalized monotone-follower problem in dimension $k = 1$. Our formulation of the
optimal-stopping problem differs slightly from the classical one, but is easily seen to be essentially
equivalent to it (we comment more about it below). It is chosen so as to make our point - namely
that the stopping problem associated to the monotone-follower problem is but a manifestation of the
maximum principle of Pontryagin - more prominent. It also follows our distributional philosophy
and we get to reuse the framework (and the notion) of admissible controls $\mathcal{A}$ from Definition 2.1.

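Specifically, we work on the path space $\mathcal{D}^{d+1}_{\cdot,\uparrow}(L, A)$ and, assuming that the functions $g$ and $h$ are
continuously-differentiable in $a$, with derivatives denoted by $g_a$ and $h_a$, we define

(2.3) $K(\mathbb{P}) = \mathbb{E}^{\mathbb{P}}\Big[ \Big( f(\tau_A) + g_a(L_T, 0) + \int_{\tau_A}^T h_a(L_t, 0)\, dt \Big) 1_{\{\tau_A < \infty\}} \Big]$,

where $\tau_A$ is the stopping time given by

$$\tau_A = \inf\{ t \ge 0 : A_t > 0 \}, \quad \text{with } \inf \emptyset = +\infty,$$

whenever the expression inside the expectation in (2.3) above is in $L^1(\mathbb{P})$; the set of all such $\mathbb{P}$ is
denoted by $\mathcal{A}_s$.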
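
The stopping time attached to a control is simply the first time the follower becomes strictly positive. A minimal sketch on a time grid follows; the name `first_positive_time` is hypothetical, and the sketch assumes that the path is piecewise constant between grid points so that the infimum is attained on the grid.

```python
import math
from typing import Sequence


def first_positive_time(times: Sequence[float], A: Sequence[float]) -> float:
    """First grid time at which the nondecreasing path A is strictly
    positive; returns +infinity if A stays at 0 on the whole grid."""
    for t, a in zip(times, A):
        if a > 0:
            return t
    return math.inf


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
tau_jump = first_positive_time(grid, [0.0, 0.0, 0.3, 0.3, 0.3])
tau_never = first_positive_time(grid, [0.0] * 5)
tau_initial = first_positive_time(grid, [1.0] * 5)
```

A control with a positive initial value stops immediately, and a control that never acts corresponds to the value +infinity, which is excluded by the indicator in the definition of the stopping criterion.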


Definition 2.16. A probability $\mathbb{P}^* \in \mathcal{A}_s$ is said to be a solution of the optimal-stopping problem if
$K(\mathbb{P}^*) \le K(\mathbb{P})$ for all $\mathbb{P} \in \mathcal{A}_s$.


Remark 2.17. Viewed in isolation, the above formulation of the optimal stopping problem contains
obvious redundancies (the $\mathbb{P}$-behavior of $A$ after $\tau_A$, for example). Even when the class of the
probability measures $\mathbb{P} \in \mathcal{A}$ is further restricted so that $A$ becomes a single-jump 0-to-1 process,
$\mathbb{P}$-a.s., our formulation corresponds to a randomized optimal stopping problem, in that $A$ is allowed
to depend on innovations independent of $L$. All in all, part (2) of Definition 2.1 makes the problem
equivalent to a randomized optimal stopping problem with respect to the right-continuous
augmentation of $\{\mathcal{F}^L_t\}_{t \in [0,T]}$. There is no harm, however, since it turns out that, as usual in optimal stopping,
randomization leads to no increase in value.


Theorem 2.18 (A connection between control and optimal stopping). Suppose that k = 1 and 
that the assumptions of Theorem 2.12, part (1), hold. Then any solution to the monotone-follower 
problem is also a solution to the optimal-stopping problem. 

Remark 2.19. As we do not use the notion of a value function, there is no analogue of the equation
(3.17) in Theorem 3.4, p. 862 in [KS84] about equality between the derivative (gradient) of the
value function in the control problem and the value of the optimal stopping problem. The statements
about the relationship between the optimal control in the former and the optimal stopping time in
the latter translate directly into our setting. The reader will see that the (short) proof of Theorem 2.18
below, given in subsection 3.3, is nothing but a simple observation, once the Pontryagin principle
is established.

2.6. The approximation result. In order to understand the monotone-follower problem better and
to provide an approach to it with computation in mind, we pose a sequence of its “capped” versions.
These play the role of natural regular approximands to the inherently singular monotone-follower
problem. The setting follows closely that of the previous section. The only difference is that the
set of allowed controls consists only of Lipschitz-continuous nondecreasing processes, without the
initial jump. More precisely, we have the following definition:

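Definition 2.20 (Admissible capped controls). Given $n \in \mathbb{N}$, a probability $\mathbb{P} \in \mathfrak{P}^{d+k}_{\cdot,\uparrow}(L, A)$ is
called n-capped admissible, denoted by $\mathbb{P} \in \mathcal{A}^{[n]}$, if $\mathbb{P} \in \mathcal{A}$ and, $\mathbb{P}$-a.s., the coordinate process $A$
is Lipschitz continuous with the Lipschitz constant at most $n$, and $A_0 = 0$. The value of the
n-th capped problem is given by

$$V^{[n]} = \inf_{\mathbb{P} \in \mathcal{A}^{[n]}} J(\mathbb{P}),$$

and we say that the probability measure $\mathbb{P} \in \mathcal{A}^{[n]}$ is a weak solution to the capped
monotone-follower problem if $V^{[n]} = J(\mathbb{P}) < \infty$.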
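
To see what a capped control looks like, one can build one from an arbitrary nondecreasing path by letting it chase that path at speed at most n. The sketch below is only an illustration of the constraint: the function `capped_follower` is hypothetical, it works on a time grid with linear interpolation in between, and the control it produces is admissible for the capped problem but is not claimed to be optimal for it.

```python
from typing import List, Sequence


def capped_follower(times: Sequence[float], A: Sequence[float], n: float) -> List[float]:
    """Non-anticipative n-Lipschitz path that starts at 0 and chases A.

    times is an increasing grid and A a nondecreasing path with A[0] >= 0.
    At each step the capped path moves up by at most n * dt and never
    overshoots the current value of A.
    """
    capped = [0.0]
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        capped.append(min(A[i], capped[-1] + n * dt))
    return capped


grid = [0.0, 0.25, 0.5, 0.75, 1.0]
jump_path = [1.0, 1.0, 1.0, 1.0, 1.0]   # a unit jump at time 0
slow = capped_follower(grid, jump_path, 2.0)
fast = capped_follower(grid, jump_path, 8.0)
```

With the cap n = 2 the initial unit jump is spread over the interval [0, 0.5], and with n = 8 it is already absorbed by the first grid point, which illustrates how larger caps allow closer imitation of a singular control.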

While Theorem 2.7 relied on a minimal set of assumptions, the approximation result we give below
requires more structure. Here, $C_c^\infty(\mathbb{R}^d)$ denotes the set of all infinitely-differentiable functions on $\mathbb{R}^d$
with compact support, while $C_b(\mathbb{R}^d)$ refers to the set of all bounded continuous functions; $\lambda$ denotes
the Lebesgue measure on $[0, T]$.

Theorem 2.21 (Approximation by regular controls). Suppose that

(1) The law $\mathbb{P}_0$ is Feller, in that for each $t \in [0, T)$

(a) the $\sigma$-algebras $\mathcal{F}^L_t$ and $\mathcal{F}^L_{t+}$ on $\mathcal{D}^d$ coincide $\mathbb{P}_0$-a.s.,

(b) for each $G \in C_c^\infty(\mathbb{R}^d)$ there exists $G^* \in C_b(\mathbb{R}^d)$ such that

$$\mathbb{E}^{\mathbb{P}_0}[G(L_T) \mid \mathcal{F}^L_{t+}] = G^*(L_t), \quad \mathbb{P}_0\text{-a.s.}$$

(2) The coordinate process $L$ is a quasimartingale under $\mathbb{P}_0$.

(3) The primitives $f$, $g$ and $h$ are regular enough, in that

(a) each component of $f$ is uniformly bounded away from 0,

(b) the functions $g(\cdot, a)$ and $h(\cdot, a)$ are continuous for each $a \in [0, \infty)^k$,

(c) $h(l, \cdot)$ and $g(l, \cdot)$ are continuously differentiable and convex on $[0, \infty)^k$ for each $l \in \mathbb{R}^d$,
and there exist $p, q > 1$ and Borel functions $\Phi_g, \Phi_h : \mathbb{R}^d \to [0, \infty)$ with

$$\Phi_h(L) \in L^p(\lambda \otimes \mathbb{P}_0) \quad \text{and} \quad \Phi_g(L_T) \in L^p(\mathbb{P}_0),$$

such that, for $\varphi \in \{g, h\}$, we have

$$\varphi(l, 0) + |\nabla \varphi(l, a)| \le \Phi_\varphi(l) + |a|^q, \quad \text{for all } (l, a) \in \mathbb{R}^d \times [0, \infty)^k.$$

Then

• For each $n$, the capped problem admits a solution $\mathbb{P}^{[n]}$ and $V^{[n]} \searrow V$.

• A subsequence of the sequence $\{\mathbb{P}^{[n]}\}_{n \in \mathbb{N}}$ converges in the Meyer-Zheng sense to a solution
$\mathbb{P}$ of the monotone-follower problem.

Remark 2.22.

(1) There are several slightly-different classes of processes found under the name of a Feller
process in the literature, so we make the essential properties needed in the proof explicit in
the statement. These particular properties are, furthermore, implied by all the definitions
of the Feller property the authors have encountered. Consequently, all standard examples
of Feller processes such as diffusions, stable processes, Lévy processes, etc., fall under our
framework.

(2) The quasimartingality assumption on $L$ is put in place mostly for convenience. It is known
that so-called “nice” Feller processes (the domain of whose generator contains smooth
functions with compact support) are automatically special semimartingales and, therefore, local
quasimartingales (see [Sch12] for the first part of the statement, and [Kal02, Theorem 23.20,
p. 451] for the second). As no convexity in the variable $l$ is assumed, one can further do
away with the localization in many cases by replacing $L$ by $\varrho(L)$, where $\varrho$ is a smooth,
injective and bounded function. Such a replacement would not change the problem; indeed,
conditions (1) and (3) of Theorem 2.21 are invariant under the transformation $L \mapsto \varrho(L)$.

(3) The growth assumptions on the functions $f$, $g$ and $h$ are essentially those of [KS84], rephrased
in our language. We note that the fact that $f$ is bounded away from zero immediately implies
the linear coercivity condition of Theorem 2.7, while the condition $\varphi(l, 0) \le \Phi_\varphi(l)$, for
$\varphi \in \{g, h\}$, guarantees that the value is finite.

Example 2.23. In general, the sequence of capped optimizers cannot be guaranteed to converge 
towards a minimizer P* weakly, under the the Skorokhod topology. Indeed, Skorokhod convergence 
preserves continuity, and all capped optimal controls are continuous, but it is easily seen that the 
solution to the monotone-follower problem does not need to be a continuous process. Indeed, it 
suffices to take k = d = 1, any Pq with Po[Lt > 1] > 0, f = 1, h = 0 and g(l, a) = ^(l — a)^, so 
that the optimal A is given hy At = 0 for t < T and At = max(0, Lt — 1). 

On the other hand, if one can guarantee that the optimizer is continuous (and $A_0 = 0$), the Meyer-Zheng
convergence automatically upgrades to the weak convergence in $C[0,T]$ (see [Pra99]).
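
The terminal jump in Example 2.23 can be checked numerically. The following is a minimal sketch, assuming the data of the example ($f = 1$, $h = 0$, $g(l,a) = \tfrac{1}{2}(l-a)^2$) and the observation that, with no running cost and constant $f$, only the terminal level $A_T$ enters the cost, which then equals $A_T + g(L_T, A_T)$. The function names and the brute-force grid search are hypothetical and serve only as an illustration.

```python
import numpy as np

def terminal_cost(l, a):
    """Cost of ending at level a when L_T = l, for f = 1, h = 0, g(l, a) = (l - a)^2 / 2."""
    return a + 0.5 * (l - a) ** 2

def closed_form_jump(l):
    """Candidate optimizer from Example 2.23: A_T = max(0, L_T - 1)."""
    return max(0.0, l - 1.0)

def grid_minimizer(l, a_max=10.0, n=100_001):
    """Brute-force minimizer of the terminal cost over a grid of levels a >= 0 (illustration only)."""
    grid = np.linspace(0.0, a_max, n)
    return float(grid[np.argmin(terminal_cost(l, grid))])

if __name__ == "__main__":
    for l in (-0.5, 0.3, 1.0, 2.5, 4.0):
        print(l, closed_form_jump(l), grid_minimizer(l))
```

On the event $\{L_T > 1\}$ the minimizing level is strictly positive, so a control that waits until time $T$ must jump there; this is the discontinuity the example relies on.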

One of the immediate consequences of Theorem 2.21 is that the monotone-follower problem can be 
posed over Lipschitz controls, without affecting the value function. 

Corollary 2.24 (Lipschitz ε-optimal controls). Under the conditions of Theorem 2.21, for each ε > 0
there exist M > 0 and an ε-optimal admissible control $\mathbb{P}$, such that A is uniformly M-Lipschitz,
$\mathbb{P}$-a.s.

3. Proofs

Proofs of our main results, namely Theorems 2.7, 2.12, 2.18 and 2.21 are collected in this section. 
The proof of each theorem occupies a section of its own, and all the conditions stated in the theorem 
are assumed to hold - without explicit mention - throughout the section. 

3.1. A proof of Theorem 2.7. We start with an auxiliary result which states that an admissible 
control can always be turned into a strong admissible control without any sacrifice in value. The 
central idea is that, even though the optional projection of a nondecreasing process is not necessarily 
nondecreasing in general, this turns out to be so in our setting. 

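Proposition 3.1. For $\mathbb{P} \in \mathcal{A}$ with $J(\mathbb{P}) < \infty$ let ${}^oA$ be the optional projection of $A$ onto the
right-continuous and complete augmentation $\mathbb{F}^L_+$ of the natural filtration $\mathbb{F}^L$. Then the joint law ${}^o\mathbb{P}$ of
$(L, {}^oA)$ is admissible and $J({}^o\mathbb{P}) \le J(\mathbb{P})$.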

Proof. The optional projection of a càdlàg process onto a filtration satisfying the usual conditions is
indistinguishable from a càdlàg process (see, e.g., Theorem 2.9, p. 18 in [BC09]). It is an immediate
consequence of the condition (2) of Definition 2.1 that

${}^oA_t = \mathbb{E}^{\mathbb{P}}[A_t \mid \mathcal{F}^L_T]$, a.s., for all $t \in [0,T]$,

and, so, ${}^oA_t = \mathbb{E}^{\mathbb{P}}[A_t \mid \mathcal{F}^L_T] \le \mathbb{E}^{\mathbb{P}}[A_s \mid \mathcal{F}^L_T] = {}^oA_s$, a.s., for $t \le s$. By construction, the σ-algebras
and differ only in P-negligible sets, and, so, IFt and are conditionally independent given 
which, in turn, implies that the joint law of (L, A) is admissible. 

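Next, we show that $J({}^o\mathbb{P}) \le J(\mathbb{P})$. For $\varphi \in \{g, h\}$ we denote by $\varphi^*(l, \cdot)$ the convex conjugate (in
the second variable) of $\varphi$:

$\varphi^*(l, \alpha) = \sup_a \big( \alpha a - \varphi(l, a) \big)$, so that $\varphi(l, a) = \sup_\alpha \big( \alpha a - \varphi^*(l, \alpha) \big)$.

Then, for any bounded $\mathcal{F}^L_T$-measurable random variable $\alpha_t$ with $\varphi^*(L_t, \alpha_t) < \infty$, $\mathbb{P}$-a.s., we have

$\mathbb{E}^{\mathbb{P}}[\varphi(L_t, A_t) \mid \mathcal{F}^L_T] \ge \mathbb{E}^{\mathbb{P}}[\alpha_t A_t \mid \mathcal{F}^L_T] - \varphi^*(L_t, \alpha_t) = \alpha_t\,{}^oA_t - \varphi^*(L_t, \alpha_t)$, $\mathbb{P}$-a.s.

The $\mathbb{P}$-essential supremum of the right-hand side over all bounded $\mathcal{F}^L_T$-measurable $\alpha_t$ is easily
seen to be equal to $\varphi(L_t, {}^oA_t)$, $\mathbb{P}$-a.s., for $t \in [0, T]$, so, by the tower property,
$\mathbb{E}^{\mathbb{P}}[\varphi(L_t, {}^oA_t)] \le \mathbb{E}^{\mathbb{P}}[\varphi(L_t, A_t)]$. Thus,

$\mathbb{E}^{\mathbb{P}}\Big[\int_0^T h(L_t, {}^oA_t)\,dt + g(L_T, {}^oA_T)\Big] \le \mathbb{E}^{\mathbb{P}}\Big[\int_0^T h(L_t, A_t)\,dt + g(L_T, A_T)\Big]$.

Finally, we let $\mathcal{M}$ denote the set of all bounded measurable functions $m : [0, T] \to \mathbb{R}$ with

(3.1) $\mathbb{E}\Big[\int_{[0,T]} m(t)\,d\,{}^oA_t\Big] = \mathbb{E}\Big[\int_{[0,T]} m(t)\,dA_t\Big]$.

$\mathcal{M}$ is clearly a monotone class which contains all functions of the form $m = 1_{(a,T]}$, so, by the
monotone-class theorem, it contains all bounded measurable functions and, in particular, $f$. □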
Continuing with the proof of Theorem 2.7, we assume that its value is finite, pick a minimizing
sequence $\{\mathbb{P}_n\}_{n\in\mathbb{N}} \subseteq \mathcal{A}$, and use it to build a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and, on it, the sequence $L$,
$A^{(1)}, A^{(2)}, \dots$, as in Lemma A.2.

Thanks to Proposition 3.1, we may assume, without loss of generality, that all $A^{(n)}$ are $\mathbb{F}^L_+$-adapted,
where $\mathbb{F}^L_+ = (\mathcal{F}^L_{t+})_{t\in[0,T]}$ denotes the right-continuous and complete augmentation of the natural
filtration $\mathbb{F}^L$ of $L$.

Now that a common probability space has been constructed, we follow the methodology of [BR01]
and [RS11]. Thanks to the linear coercivity condition (2.1), the sequence $\{A^{(n)}_T\}_{n\in\mathbb{N}}$ is bounded
in $L^1$; also, all $A^{(n)}$ are $\mathbb{F}^L_+$-adapted, and $\mathbb{F}^L_+$ is right-continuous. Therefore, we can use Lemma
3.5, p. 470, in [Kab99] to guarantee the existence of an $\mathbb{F}^L_+$-adapted process $B$, with paths in $\mathcal{V}^k$,
and a sequence $\{B^{(n)}\}_{n\in\mathbb{N}}$ of Cesàro means of a subsequence of $\{A^{(n)}\}_{n\in\mathbb{N}}$ which converges to
$B$ in the following sense (the sense of optional random measures): for almost all $\omega$, the Stieltjes
measures induced by $B^{(n)}(\omega)$ converge weakly towards the Stieltjes measure induced by $B(\omega)$.
In particular, there exists a countable subset $\mathcal{N}$ of $[0, T)$ (the set of jumps of $t \mapsto \mathbb{E}[B_t]$ on $[0, T)$)
such that

$\int_0^T f(t)\,dB^{(n)}_t \to \int_0^T f(t)\,dB_t$, a.s., and $B^{(n)}_t \to B_t$, a.s., for all $t \in [0,T] \setminus \mathcal{N}$.

Therefore, by Fatou's lemma (applied on $\Omega$ for the first and the third term, and on the product space
$[0, T] \times \Omega$ for the second), we have

$\mathbb{E}\Big[\int_0^T f(t)\,dB_t + \int_0^T h(L_t, B_t)\,dt + g(L_T, B_T)\Big] \le \liminf_{n\to\infty} \mathbb{E}\Big[\int_0^T f(t)\,dB^{(n)}_t + \int_0^T h(L_t, B^{(n)}_t)\,dt + g(L_T, B^{(n)}_T)\Big]$.

For a nondecreasing càdlàg process $A$ on $(\Omega, \mathcal{F}, \mathbb{P})$ we set $J(A) = \mathbb{E}[C(L, A)]$ and notice that
the convexity of $J$ and the fact that $J(A^{(n)}) \searrow \inf_{\mathbb{P}\in\mathcal{A}} J(\mathbb{P})$ together yield that $\{B^{(n)}\}_{n\in\mathbb{N}}$ is a
minimizing sequence, too, in that $J(B^{(n)}) \searrow \inf_{\mathbb{P}\in\mathcal{A}} J(\mathbb{P})$. Therefore, $J(B) \le \inf_{\mathbb{P}\in\mathcal{A}} J(\mathbb{P})$ and it
only remains to note that the law of $(L, B)$ is strongly admissible since $B$ is $\mathbb{F}^L_+$-adapted.

3.2. A proof of Theorem 2.12. To streamline the presentation in this and the subsequent subsections,
we introduce additional notation: the subgradient map $\partial C(L, A) : [0,T] \to \mathbb{R}^k$, at
$(L, A) \in \mathcal{D}^d \times \mathcal{V}^k$, is given by

$\partial C(L, A)_t = f(t) + \int_t^T \nabla h(L_s, A_s)\,ds + \nabla g(L_T, A_T)$ for $t \in [0, T]$,

where, as usual, $\nabla h$ and $\nabla g$ denote the gradients with respect to the second variable. The reader
will easily check that $\partial C(L, A)$ has the following property (which earns it the name subgradient):

(3.2) $C(L, A + \Delta) \ge C(L, A) + \langle \partial C(L, A), \Delta \rangle$,

for all $\Delta$ with $A + \Delta \in \mathcal{V}^k$, where

$\langle X, \Delta \rangle = \int_{[0,T]} X_u\,d\Delta_u$.
We also note, for future reference and using integration by parts, that

(3.3) $\langle \partial C(L, A), \Delta \rangle = \int_{[0,T]} f(t)\,d\Delta_t + \int_0^T \nabla h(L_t, A_t)\,\Delta_t\,dt + \nabla g(L_T, A_T)\,\Delta_T$,

for all $\Delta$ as in (3.2).
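
The subgradient inequality (3.2) and the pairing in (3.3) have exact discrete-time analogues, which can be verified numerically. The sketch below is an illustration under assumptions that are not part of the paper: $k = d = 1$, a uniform time grid on $[0,1]$, the quadratic costs $h(l,a) = g(l,a) = \tfrac{1}{2}(l-a)^2$, $f(t) = 1 + t$, and randomly generated nondecreasing paths. All function names are hypothetical.

```python
import numpy as np

# Illustrative convex costs (assumptions): h(l, a) = g(l, a) = (l - a)^2 / 2.
def h(l, a):
    return 0.5 * (l - a) ** 2

def dh(l, a):
    return a - l  # derivative of h in the second variable

g, dg = h, dh

def cost(f, L, A, dt):
    """Discrete C(L, A): sum of f dA (including the initial jump A_0), plus running and terminal cost."""
    dA = np.diff(A, prepend=0.0)
    return float(np.sum(f * dA) + np.sum(h(L, A)) * dt + g(L[-1], A[-1]))

def subgradient(f, L, A, dt):
    """Discrete analogue of dC(L, A)_t = f(t) + int_t^T dh(L_s, A_s) ds + dg(L_T, A_T)."""
    tail = np.cumsum(dh(L, A)[::-1])[::-1] * dt  # sum over grid points j >= i
    return f + tail + dg(L[-1], A[-1])

def pairing(X, Delta):
    """Discrete analogue of <X, Delta> = int_[0,T] X_u dDelta_u."""
    return float(np.sum(X * np.diff(Delta, prepend=0.0)))

def subgradient_gap(seed, n_steps=200):
    """Returns C(L, A + Delta) - C(L, A) - <dC(L, A), Delta> for random paths."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    t = np.linspace(0.0, 1.0, n_steps + 1)
    f = 1.0 + t
    L = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps + 1))
    A = np.cumsum(rng.random(n_steps + 1)) * dt      # nondecreasing base control
    B = np.cumsum(rng.random(n_steps + 1)) * dt      # nondecreasing competitor
    Delta = B - A                                    # A + Delta = B is nondecreasing
    lhs = cost(f, L, A + Delta, dt)
    rhs = cost(f, L, A, dt) + pairing(subgradient(f, L, A, dt), Delta)
    return lhs - rhs

if __name__ == "__main__":
    print([subgradient_gap(seed) for seed in range(3)])
```

In this discretization the gap reduces to the sum of the convexity gaps of $h$ and $g$ at each grid point, so it is nonnegative up to rounding.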

We start the proof by assuming that P G A) solves the monotone-follower problem, with 

value V = J (P) < oo. In particular, we have C{L,A) G (P). To relieve the notation we work on 
the sample space fl = A), under the probability P, until the end of this part of the proof. 

Moreover, thanks to assumptions of the theorem, for (p G {g, h}, I G a G [0, oo)‘^ and a; G K.'^ 
such that a + X G [0, oo)'^, we have, for each c G (0,1), 


I a + cx)\ < a + cx) 


= ^^{l) + Mip(l,a) + M f {x,V(p{l,a + tx)) dt 

Jo 


< 


+ M(p{l,a)j + M \x\ f a + fx)| (if 


Gronwall’s inequality then implies that 

(3.4) \Vip{l,a + x)\ < + 

Let Va denote the set of all bounded processes A with paths in adapted to the natural filtration 
such that. 


either A G or A = — 4 min(A, n) for some n G N. 

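It has the property that for $\varepsilon \in [0,1]$ and $\Delta \in \mathcal{V}_a$, the joint law $\mathbb{P}^\varepsilon$ of $(L, A^\varepsilon)$, where $A^\varepsilon = A + \varepsilon\Delta$,
is an admissible probability measure. By the optimality of $A$ and (3.2), we have

$\mathbb{E}[C(L, A)] \le \mathbb{E}[C(L, A^\varepsilon)] \le \mathbb{E}[C(L, A) + \langle \partial C(L, A^\varepsilon), \varepsilon\Delta \rangle]$,

from where it follows that

(3.5) $\langle \partial C(L, A^\varepsilon), \Delta \rangle^- \in L^1$ and $\mathbb{E}[\langle \partial C(L, A^\varepsilon), \Delta \rangle] \ge 0$, for all $\varepsilon \in [0,1]$.

Thanks to boundedness of processes in $\mathcal{V}_a$ and the fact that $C(L, A)$ is integrable, the inequality
(3.4) implies that the family

$\{ \langle \partial C(L, A^\varepsilon), \Delta \rangle : \varepsilon \in [0,1] \}$ is uniformly integrable for all $\Delta \in \mathcal{V}_a$.

Moreover, both $\nabla h$ and $\nabla g$ are continuous, so

$\lim_{\varepsilon \to 0} \langle \partial C(L, A^\varepsilon), \Delta \rangle = \langle \partial C(L, A), \Delta \rangle$, a.s.

It follows that we can pass to the limit as $\varepsilon \to 0$ in (3.5) to conclude that

(3.6) $\mathbb{E}[\langle \partial C(L, A), \Delta \rangle] \ge 0$, for all $\Delta \in \mathcal{V}_a$,

and, consequently, that

(3.7) $\mathbb{E}[\langle Y, \Delta \rangle] \ge 0$, for all $\Delta \in \mathcal{V}_a$,

where $Y$ denotes the optional projection of $\partial C(L, A)$ onto the right-continuous and complete augmentation
$\mathbb{F}^{L,A}_+$ of $\mathbb{F}^{L,A}$. Since $\partial C(L, A)$ is càdlàg, the process $Y$ can be chosen in a càdlàg version,
too (see Theorem 2.9, p. 18 in [BC09]). Hence, by varying $\Delta$ in the class of nondecreasing processes
in $\mathcal{V}_a$, we can conclude that $Y_t \ge 0$, for all $t \in [0, T]$, a.s.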


On the other hand if we use each element of the sequence A„ = — ^ min(A, n) in (3.7), we obtain 


$\int_{[0,T]} Y_t\,dA_t = 0$, a.s.

In order to show that the law of the triple $(L, A, Y)$ solves the Pontryagin FBSDE, we only need
to argue that $Y + \int_0^{\cdot} \nabla h(L_t, A_t)\,dt - f$ is an $\mathbb{F}^{L,A}_+$-martingale. This follows
directly from the fact that $Y$ is a càdlàg version of the optional projection of $\partial C(L, A)$ onto $\mathbb{F}^{L,A}_+$.


Conversely, let P G A, F) be a solution to the Pontryagin FBSDE. To prove that P = 

Pl,a is a weak minimizer in the monotone-follower problem, we pick a competing admissible mea¬ 
sure P' G A. Using Lemma A.l, we construct the measure P = P 0 P' on (with coordi¬ 

nates (L, A, Y, A')). Since f‘(L,A,Y) solves the Pontryagin FBSDE, Y + Vh{Lt, At) dt — f is an 
(fL,A,Y ^ P)-martingale. Moreover, the L-conditional independence between A' and (A, Y) implies 
that it is also an {W^AYA ^ p).martingale. Consequently, we have 

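$\mathbb{E}[\langle \partial C(L, A), A \rangle] = \mathbb{E}[\langle Y, A \rangle]$ and $\mathbb{E}[\langle \partial C(L, A), A' \rangle] = \mathbb{E}[\langle Y, A' \rangle]$,

with all expectations taken under the coupled measure. The subgradient inequality (3.2) then implies that

(3.8) $J(\mathbb{P}') = \mathbb{E}[C(L, A')] \ge \mathbb{E}[C(L, A) + \langle \partial C(L, A), A' - A \rangle] = J(\mathbb{P}) + \mathbb{E}[\langle Y, A' - A \rangle] = J(\mathbb{P}) + \mathbb{E}[\langle Y, A' \rangle] \ge J(\mathbb{P})$.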


3.3. A proof of Theorem 2.18. Let P £ .4 be a solution to the monotone-follower problem. By 
Theorem 2.12, part (1), it can be realized as the marginal ^l,a of some solution Vl,a,y of the 
Pontryagin FBSDE. For an admissible measure P' G A, and using Lemma A.l, we can construct 
the measure P = P(8)P' on (with coordinates (L, A, Y, A)) and work on under P for the 
remainder of the proof. As argued in the previous subsection, the process Y + ha{Lt, At) dt — f 
is an , P)-martingale, and, so, 

® A)tj^, 1{t^,<oo}] 

where $\tau_{A'} = \inf\{t \ge 0 : A'_t > 0\} \in [0, T] \cup \{\infty\}$. By the assumptions of convexity we placed on
$h$ and $g$, we have the following inequalities

$h_a(L_s, 0) - h_a(L_s, A_s) \le 0$ and $g_a(L_T, 0) - g_a(L_T, A_T) \le 0$,

for all $s \in [0, T]$, a.s. Therefore, by the nonnegativity of $Y$, we have

iF(r) =E[5C(L,0).^,lw,<oo}] -Y.^,l{x,,<oo}] 


= E 


> E 


f {ha{Ls, 0) - ha{Ls, As)) ds + (^ga{LT, 0) - ga{LT, At)) l{r^,<oo} 
Jtai 

j {ha{Ls, 0) - ha{Ls, As)) ds + {ga{LT, 0) - Pa(-^T, At)) 


= E[5C(L,0)o-Yo] 
On the other hand, if we repeat the computation above with $\tau_{A'}$ replaced by $\tau_A$, all the inequalities
become equalities, implying that $K(\mathbb{P}) \le K(\mathbb{P}')$. Indeed, we clearly have

$h_a(L_s, 0) = h_a(L_s, A_s)$, on $\{s < \tau_A\}$,

and

$g_a(L_T, 0) = g_a(L_T, A_T)$ on $\{\tau_A = \infty\}$,

as well as

$\mathbb{E}[Y_{\tau_A} 1_{\{\tau_A < \infty\}}] = 0$,

where this last equality follows from the fact that $\int_{[0,T]} Y_t\,dA_t = 0$.

3.4. A proof of Theorem 2.21. We start by posing the capped monotone-follower problems on a
common fixed probability space $(\Omega, \mathcal{F}, \mathbb{P})$ which hosts a càdlàg process $L$ with distribution $\mathbb{P}_L$, and
consider only the right-continuous and complete augmentation $\mathbb{F}^L_+$ of the natural filtration $\mathbb{F}^L$, generated
by $L$. Let $\mathcal{U}^{[n]}$ denote the set of all progressively-measurable $k$-dimensional processes with values
in $[0, n]^k$. For $u \in \mathcal{U}^{[n]}$, all components of the process $A = \int_0^{\cdot} u(t)\,dt$ are Lipschitz continuous with
the Lipschitz constant not exceeding $n$. Conversely, each adapted process with such Lipschitz paths
admits a similar representation. This correspondence allows us to pose the $n$-th capped monotone
follower problem either over the set of processes $\mathcal{U}^{[n]}$ or over the appropriate admissible set
$\mathcal{A}^{[n]} = \{\int_0^{\cdot} u_t\,dt : u \in \mathcal{U}^{[n]}\}$. Their (strong) value functions are then defined by

(3.9) $V^{[n]} = \inf_{A \in \mathcal{A}^{[n]}} \mathbb{E}[C(L, A)] = \inf_{u \in \mathcal{U}^{[n]}} J(u)$, where $J(u) = \mathbb{E}\big[C\big(L, \int_0^{\cdot} u_t\,dt\big)\big]$.

Each $A \in \mathcal{A}^{[n]}$ is $\mathbb{F}^L_+$-adapted and, therefore, strongly admissible, in the sense of Definition 2.20.
In particular, $V^{[n]} \ge V$, for all $n$. Also, noting that the polynomial-growth assumption implies
that $\mathbb{E}[C(L, A)] < \infty$, for each bounded $A$, we have $V^{[n]} < \infty$, for all $n \in \mathbb{N}$, and, consequently,
$V < \infty$.
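
The correspondence between rates $u \in \mathcal{U}^{[n]}$ and $n$-Lipschitz controls can be illustrated on a time grid. The following is a minimal sketch for $k = 1$, assuming a piecewise-constant rate on a uniform grid; the function names `capped_control` and `lipschitz_constant` are hypothetical and not taken from the paper.

```python
import numpy as np

def capped_control(u, cap, dt):
    """Builds A_t = int_0^t u_s ds on a uniform grid from a rate u clipped to [0, cap]."""
    rate = np.clip(u, 0.0, cap)              # enforce the cap 0 <= u <= n
    return np.concatenate(([0.0], np.cumsum(rate) * dt))

def lipschitz_constant(A, dt):
    """Largest slope of the piecewise-linear interpolation of A."""
    return float(np.max(np.abs(np.diff(A))) / dt)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dt = 1e-3
    u = rng.uniform(-5.0, 50.0, size=1000)   # arbitrary raw rates, clipped inside
    A = capped_control(u, cap=3.0, dt=dt)
    print(A[0], lipschitz_constant(A, dt))
```

The resulting path starts at zero, is nondecreasing, and its slope never exceeds the cap, which is the property used to pose the $n$-th capped problem over $\mathcal{A}^{[n]}$.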

For readability, we split the remainder of the proof into several subsections. 


3.4.1. Existence in the prelimit. Let $L^2([0, T] \times \Omega, \mathrm{Prog})$ denote the space of all ($\lambda \otimes \mathbb{P}$-equivalence
classes of) $\mathbb{F}^L_+$-progressively-measurable processes $u$ on $[0, T] \times \Omega$ with

$\|u\|_{L^2([0,T]\times\Omega, \mathrm{Prog})} = \Big( \mathbb{E}\Big[ \int_0^T |u_t|^2\,dt \Big] \Big)^{1/2} < \infty$.

Proposition 3.2. The infimum in (3.9) is attained at some $u^{[n]} \in \mathcal{U}^{[n]}$.

Proof. We proceed in the standard way, using the so-called “direct method”. Let $\{u_k\}_{k\in\mathbb{N}} \subseteq \mathcal{U}^{[n]}$
be a minimizing sequence, i.e., $J(u_k) \searrow V^{[n]}$. Since $\mathcal{U}^{[n]}$ is bounded in $L^2([0, T] \times \Omega, \mathrm{Prog})$, the
Banach-Saks theorem implies that we can extract a subsequence whose Cesàro means (still denoted
by $\{u_k\}_{k\in\mathbb{N}}$) converge strongly towards some $u^{[n]} \in L^2([0,T] \times \Omega, \mathrm{Prog})$. Furthermore, given
that $\mathcal{U}^{[n]}$ is closed and convex, we have $u^{[n]} \in \mathcal{U}^{[n]}$, as well. Thanks to the convexity of $J$, which
is inherited from $C$, $\{u_k\}_{k\in\mathbb{N}}$ remains a minimizing sequence. Hence, to show that $u^{[n]}$ is the
minimizer, it will be enough to establish lower semicontinuity of $J$ on $\mathcal{U}^{[n]}$, which is, in turn, a direct
consequence of Fatou's lemma. □
3.4.2. A version of the Pontryagin FBSDE. Having established the existence in the (strong) capped
monotone follower problem, for each $n \in \mathbb{N}$ we pick and fix a minimizer $u^{[n]}$ as in Proposition 3.2
and turn to a capped version of the Pontryagin FBSDE. We state it in a very weak form (namely,
as Proposition 3.3) which will, nevertheless, suffice to establish the validity of the full Pontryagin
FBSDE in the limit. The following notation will be used throughout:

$A^{[n]} = \int_0^{\cdot} u^{[n]}_t\,dt$, $N^{[n]} = \int_0^{\cdot} \nabla h(L_t, A^{[n]}_t)\,dt$, $F^{[n]} = \int_0^{\cdot} f(s)\,dA^{[n]}_s$,

as well as

$M^{[n]}_t = \mathbb{E}\big[ \nabla g(L_T, A^{[n]}_T) + N^{[n]}_T \mid \mathcal{F}^L_{t+} \big]$, $Y^{[n]}_t = f(t) + M^{[n]}_t - N^{[n]}_t$,

all taken in their càdlàg versions. We note immediately that, thanks to the polynomial-growth condition,
all the integrals above are well defined, and that $Y^{[n]}$ is the optional projection of $\partial C(L, A^{[n]})$
onto $\mathbb{F}^L_+$.

Proposition 3.3. For $n \in \mathbb{N}$, we have

(3.10) $n\,\mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^-\,dt\Big] = -\mathbb{E}\Big[\int_0^T Y^{[n]}_t\,dA^{[n]}_t\Big]$

and

(3.11) $\lim_{n\to\infty} \mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^-\,dt\Big] = 0$.

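Proof. Given $v \in \mathcal{U}^{[n]}$ and $\varepsilon \in [0,1]$ we set $B = \int_0^{\cdot} v_t\,dt$ and define $A^\varepsilon = (1-\varepsilon) A^{[n]} + \varepsilon B$.
Since $C(L, A^{[n]}) \in L^1$, the optimality of $u^{[n]}$ and (3.2) imply that

$0 \ge \mathbb{E}[C(L, A^{[n]})] - \mathbb{E}[C(L, A^\varepsilon)] \ge \varepsilon\,\mathbb{E}[\langle \partial C(L, A^\varepsilon), A^{[n]} - B \rangle]$.

We let $\varepsilon \searrow 0$ and use the dominated convergence theorem to conclude that

(3.12) $\mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^+ (u^{[n]}_t - v_t)\,dt\Big] \le \mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^- (u^{[n]}_t - v_t)\,dt\Big]$.

Setting $v = n 1_{\{Y^{[n]} < 0\}}$ yields

(3.13) $\mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^+ u^{[n]}_t\,dt\Big] \le \mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^- (u^{[n]}_t - n)\,dt\Big]$.

Since the left-hand side of (3.13) is nonnegative and the right-hand side nonpositive, we conclude
that both of them vanish, which, in turn, directly implies (3.10).

To show (3.11) we use the inherited subgradient property of $Y^{[n]}$ and (3.10) to obtain

$0 \le \mathbb{E}[C(L, A^{[n]})] \le \mathbb{E}[C(L, 0)] + \mathbb{E}\Big[\int_0^T Y^{[n]}_t\,dA^{[n]}_t\Big] = \mathbb{E}[C(L, 0)] - n\,\mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^-\,dt\Big]$.

Since $\mathbb{E}[C(L, 0)] < \infty$, dividing by $n$ and letting $n \to \infty$ yields (3.11). □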
3.4.3. Relative compactness in the Meyer-Zheng topology. Our next step is to pass to the limit, as
$n \to \infty$, in the Meyer-Zheng convergence and show that the limiting law satisfies the weak FBSDE
(2.10). The reader will find a short recapitulation of the pertinent known results on the Meyer-Zheng
convergence (minimally modified to fit our needs) in subsections A.3, A.4 and A.5 of Appendix A.

In the sequel, $\{\mathbb{P}^{[n]}\}_{n\in\mathbb{N}}$ denotes the sequence of laws of the triplets $(L, A^{[n]}, M^{[n]})$ on $\mathcal{D}^{d+2k}$.

Proposition 3.4. For each $p \ge 1$, we have

(3.14) $\sup_n \mathbb{E}\big[ |A^{[n]}_T|^p \big] < \infty$,

and the sequence $\{\mathbb{P}^{[n]}\}_{n\in\mathbb{N}}$ is relatively compact in the Meyer-Zheng topology on $\mathcal{D}^{d+2k}$.


Proof. Since the distribution of the first component $L$ does not depend on $n$, by Theorem A.5, it will
be enough to establish that

$\sup_{n\in\mathbb{N}} \mathrm{Var}^{\mathbb{P}^{[n]}}[A] < \infty$ and $\sup_{n\in\mathbb{N}} \mathrm{Var}^{\mathbb{P}^{[n]}}[M] < \infty$,

where $\mathrm{Var}^{\mathbb{P}^{[n]}}$ denotes the conditional variation (in the quasimartingale sense, as defined in (A.3),
below). Moreover, given that all $A^{[n]}$ are nondecreasing, and all $M^{[n]}$ are martingales, relative
compactness will follow once we show that

$\sup_n \mathbb{E}[A^{[n]}_T] < \infty$ and $\sup_n \mathbb{E}[|M^{[n]}_T|] < \infty$,

for which - thanks to our polynomial-growth assumption - it will suffice to establish (3.14). In order
to do that, for n G N and r > 0 define so that 


aN;’’ _ 


''ds = A. 


tATW(r)’ 


where T["'l(r) = inf{f G [0,T] : A["^ > r} G [0,T] U {oo}. By the sub-optimality of we 
have 


E 


r dt + r h{L„ dt + g{L„ 

Jo Jo 

[ [ h{Lt,A^^^)dt-\-g{Lt,AP) 

Jo Jo 


so that 


E 


/rATH(T-) 


< E 


f{t)uP dt 


h{Lt,r) - h{Lt,A^f^) dt -f {g{Lt,r) - g{Lt,A\^^)^ 


{Af'>T} 


/TAT["](r) 

Since / is positive and componentwise bounded away from zero (say, by c > 0), and h, g are 
nonnegative and convex in their second argument, we have 

E 





/ f{t)u^^^dt 

JTAT["l(r) 

Al 

(A' 
as well as, on > r}, 

pT pT pT 

/ h{Lt,r) < / h{Lt, dt + / h{Lt,0)dt and 

^TAr["l(r) ^TAT["l(r) Jo 

9 {Lt,r) < g{Lt, + g{Lt,0) 

It remains to apply Lemma A.3 with X = | | and Y = h(Lt,0) dt + g{Lt, 0), to conclude 

that {}neN is bounded in L^, for each p > 0. □ 

3.4.4. The Meyer-Zheng limit and its first properties. Having established the relative compactness
of the sequence $\{\mathbb{P}^{[n]}\}_{n\in\mathbb{N}}$, we select one of its limit points $\mathbb{P}^*$. By passing to a subsequence, if
necessary, we may assume that $\mathbb{P}^{[n]} \to \mathbb{P}^*$ in the Meyer-Zheng topology.

Proposition 3.5. $\mathbb{P}^*$ is (weakly) admissible.

Proof. Since the first components L have the same law under each P("i (namely Pq), it is clear 
that the same remains true in the limit. To establish the requirement (2) of Definition 2.1, we pick 
TO G N, two continuous and bounded functions F : ^ R and H : (R'^)™ —>• R, as 

well as a (7“ (R'^)-function G. Thanks to the admissibility of each p("), for each n G N and all 
f < Si < • • • < Sm < T, we have 

[FG{Lt)\T^^] =E^‘'‘’ [F\F^^] E^'"' [G{Lt)\T^^] , 

where F = F(Asj,..., As„^). Since P^"^ = Pl and thanks to first assumption of Theorem 2.21, 
for all n G N we have 

E^'”' [G(LT)|7^i+] = G*{Lt), P(") -a.s., 
for some G* G Gh(R^). Thus, for 0 < ri < • ■ • < < f, we have 

[FG{Lt)H] = [FG*{Lt)H ], p(") - a.s., 

where FI — H{Lr^,... ,Lr^). Thanks to Theorem A.4, after another passage to a subsequence, 
there exists a full-measure subset T of [0,T], which includes T, such that P”-finite-dimensional 
distributions with indices in T converge towards the P-finite-dimensional distributions. Hence, if 
ri < • • • < Tm, t and si < • • • < Sm belong to such T, we have 

E^* [FG(Lt)H] = E®" [FG*{Lt)H]. 

It follows that, for t G T, we have 

(3.15) E^* [FG(Lt)\XI^] = E®”* [F\F^] E®”* [G(Lt)| 7"/'] , P* - a.s., 

for all F, G. It is a part of our assumptions that a version of the Blumenthal’s 0 — 1-law holds. 
By Proposition 3.5, P^ = Pto it follows that a-algebras Ffy and Fjf_ coincide P*-a.s., for each t. 
Moreover, both sides of the equality in (3.15) above admit right-continuous versions, so it remains 
to use the density of T in [0, T] to conclude that P* is also weakly admissible. □ 

Next, we couple the probability measures {P(”i}„gN and P* on the same probability space. 
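Lemma 3.6. There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and on it a sequence $\{(L^{(n)}, A^{(n)}, M^{(n)})\}_{n\in\mathbb{N}}$ of
$\mathcal{D}^{d+2k}$-valued random elements, as well as a $\mathcal{D}^{d+2k}$-valued random element $(L, A, M)$ such that

(1) the law of $(L^{(n)}, A^{(n)}, M^{(n)})$ is $\mathbb{P}^{[n]}$ and the law of $(L, A, M)$ is $\mathbb{P}^*$, and

(2) for almost all $\omega \in \Omega$, we have

$(L^{(n)}_T(\omega), A^{(n)}_T(\omega), M^{(n)}_T(\omega)) \to (L_T(\omega), A_T(\omega), M_T(\omega))$

as well as

$(L^{(n)}_t(\omega), A^{(n)}_t(\omega), M^{(n)}_t(\omega)) \to (L_t(\omega), A_t(\omega), M_t(\omega))$

in (Lebesgue) measure in $t$.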


Proof. The first step is to use Dudley's extension (see [Dud68], Theorem 3, p. 1569) of
Skorokhod's representation theorem to transform the Meyer-Zheng convergence to an almost-sure
convergence in the pseudopath topology. Indeed, the original theorem of Skorokhod cannot be applied
directly since the canonical space $\mathcal{D}^{d+2k}$ together with the pseudopath topology is not Polish. Next,
a minimal adjustment of a result of Dellacherie (see Lemma 1, p. 356 in [MZ84]) states that the
pseudopath topology and the topology of the convergence in the sum $\lambda + \delta_T$ of the Lebesgue measure
$\lambda$ on $[0, T]$ and the Dirac mass $\delta_T$ on $\{T\}$ coincide. □


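On the probability space of Lemma 3.6, we define the sequences

$N^{(n)} = \int_0^{\cdot} \nabla h(L^{(n)}_t, A^{(n)}_t)\,dt$, $F^{(n)} = \int_0^{\cdot} f(t)\,dA^{(n)}_t$,

as well as

$N = \int_0^{\cdot} \nabla h(L_t, A_t)\,dt$, $F = \int_0^{\cdot} f(t)\,dA_t$.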

Using the polynomial-growth assumptions and the -boundedness of {Al^”TneN we see immedi¬ 
ately that 

iV in L^(A (g) P), and Mr- 

To deal with we can use an argument completely analogous to that in the last part of the 

proof of Theorem 2.7 (with K replaced by [0, T] \ T). Indeed, together with the -boundedness of 
for all p > 1, it yields that 

(3.16) Ft. 


3.4.5. A passage to a limit in the Pontryagin FBSDE. We define $Y^{(n)} = f + M^{(n)} - N^{(n)}$, so that

$Y^{(n)} \to Y = f + M - N$ in $L^1(\lambda \otimes \mathbb{P})$.

Thus,

$\mathbb{E}\Big[\int_0^T Y_t^-\,dt\Big] = \lim_n \mathbb{E}\Big[\int_0^T (Y^{(n)}_t)^-\,dt\Big] = \lim_n \mathbb{E}\Big[\int_0^T (Y^{[n]}_t)^-\,dt\Big] = 0$,

where the last equality follows directly from equation (3.11) of Proposition 3.3. Consequently, by
right continuity,

(3.17) $Y_t \ge 0$ for all $t \in [0,T]$, a.s.

Next, we observe that, by Lemma 3.6 and equation (3.16), we have 
E[CiL,A)] =limE[C(L("\ 2 l("))] = inf 

n n 

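Therefore, for each $n \in \mathbb{N}$, we have

$0 \le \mathbb{E}[C(L^{(n)}, A^{(n)})] - \mathbb{E}[C(L, A)] =: K_n + I_n$,

where

$K_n = \mathbb{E}[C(L^{(n)}, A^{(n)})] - \mathbb{E}[C(L^{(n)}, A)]$ and $I_n = \mathbb{E}[C(L^{(n)}, A) - C(L, A)]$.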

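By convexity of $h$ and $g$ and integration by parts we have

$K_n \le \mathbb{E}[\langle \partial C(L^{(n)}, A^{(n)}), A^{(n)} - A \rangle] = \mathbb{E}\Big[\int_0^T Y^{(n)}_t\,dA^{(n)}_t\Big] - R_n$,

where

$R_n = \mathbb{E}\Big[F_T + \int_0^T \nabla h(L^{(n)}_t, A^{(n)}_t) A_t\,dt + \nabla g(L^{(n)}_T, A^{(n)}_T) A_T\Big]$.

By equation (3.10) of Proposition 3.3, we then have

$K_n \le -n\,\mathbb{E}\Big[\int_0^T (Y^{(n)}_t)^-\,dt\Big] - R_n \le -R_n$.

On the other hand, thanks to the growth assumptions, the family $\{C(L^{(n)}, A) - C(L, A)\}_{n\in\mathbb{N}}$ is
uniformly integrable. By the continuity of $g$ and $h$ in the $l$-argument, we have $C(L^{(n)}, A) \to C(L, A)$,
a.s., so $I_n \to 0$, as $n \to \infty$. It follows that $\liminf_n R_n \le 0$, and, therefore,

(3.18) $\mathbb{E}\Big[F_T + \int_0^T \nabla h(L_t, A_t) A_t\,dt + \nabla g(L_T, A_T) A_T\Big] \le 0$.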


Next we investigate the martingale properties of the third component process $M$, in the spirit of
the martingale-preservation property of the Meyer-Zheng convergence (see Theorem 11, p. 368 in
[MZ84]). On the filtered probability space of the capped problem (i.e., of subsection 3.4), the
process $M^{[n]}$ is a martingale, and $A^{[n]}$ is adapted with respect to the augmented filtration generated
by $L$. Thus, we have

$\mathbb{E}\Big[M^{(n)}_T\,\varphi\big((L^{(n)}_{t_i}, A^{(n)}_{t_i}, M^{(n)}_{t_i})_{1\le i\le k}\big)\Big] = \mathbb{E}\Big[M^{(n)}_t\,\varphi\big((L^{(n)}_{t_i}, A^{(n)}_{t_i}, M^{(n)}_{t_i})_{1\le i\le k}\big)\Big]$,

for each $k \in \mathbb{N}$, each continuous bounded function $\varphi$ of the $k$ triplets, and any choice of
$0 \le t_1 < t_2 < \dots < t_k \le t$. It follows, with $\mathcal{T}$ as in Theorem A.4, that

(3.19) $\mathbb{E}[M_T \mid \mathcal{F}^{L,A,M}_t] = M_t$, a.s., for $t \in \mathcal{T}$,

and, then, by the right-continuity of the paths of $M$, that $M$ is an $\mathbb{F}^{L,A,M}_+$-martingale. The inequality
(3.18) implies that - after another round of integration by parts - we have

(3.20) $\int_0^T Y_t\,dA_t \le 0$, a.s.

It remains to aggregate the above results to conclude that the law of the triplet $(L, A, Y)$ is a weak
solution of the Pontryagin FBSDE (Definition 2.10). Part (1) is exactly the content of Proposition 3.5,
while part (2) follows from (3.17) and (3.20). Finally, (3) is simply a restatement of the martingale
property of the process $M$, established after (3.19) above. Theorem 2.12, part (2), now allows us to
conclude that the law of the pair $(L, A)$ is a solution to the monotone follower problem.

Appendix A. Auxiliary results 

In this appendix we gather several results that are used in the body of the paper. They either admit
hard-to-locate standard proofs, or are minimal extensions of the known results; we state them here,
and supply proofs, for completeness' sake.

A.1. Coupling of weakly admissible controls. We start with a simple coupling lemma based on a standard
use of regular conditional probabilities. It is used in proofs of Theorem 2.7 and Theorem 2.21
above.

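Lemma A.1 (Coupling). For $d, k, l \in \mathbb{N}$, let $\mathbb{P}$ be a probability measure on $\mathcal{D}^{d+k}$ (with coordinates
$(L, Q)$) and $\mathbb{P}'$ a probability measure on $\mathcal{D}^{d+l}$ (with coordinates $(L', R')$) such that
$\mathbb{P}_L = \mathbb{P}'_{L'}$. Then, there exists a probability measure $\bar{\mathbb{P}}$ on $\mathcal{D}^{d+k+l}$ (with coordinates $(L, Q, R)$),
denoted by $\mathbb{P} \otimes_L \mathbb{P}'$, such that

(1) $\bar{\mathbb{P}}_{L,Q} = \mathbb{P}_{L,Q}$,

(2) $\bar{\mathbb{P}}_{L,R} = \mathbb{P}'_{L',R'}$, and

(3) $Q$ and $R$ are $\bar{\mathbb{P}}$-conditionally independent, given $L$.

Proof. The space $\mathcal{D}^{d+k}$ is a Borel space, so there exists a regular conditional distribution (r.c.d.)

$\mu : \mathcal{D}^d \times \mathcal{B}(\mathcal{D}^k) \to [0,1]$, $\mu(x, B) = \mathbb{P}[Q \in B \mid L = x]$,

for $Q$, given $L$ under $\mathbb{P}$. Similarly, let $\mu' : \mathcal{D}^d \times \mathcal{B}(\mathcal{D}^l) \to [0,1]$ denote the $\mathbb{P}'$-r.c.d. of $R'$
given $L'$ and let $\bar{\mu}$ denote the product kernel $\bar{\mu} : \mathcal{D}^d \times \mathcal{B}(\mathcal{D}^{k+l}) \to [0,1]$, given by

$\bar{\mu}(x, B) = \big(\mu(x, \cdot) \otimes \mu'(x, \cdot)\big)(B)$, for $x \in \mathcal{D}^d$ and $B \in \mathcal{B}(\mathcal{D}^{k+l})$.

We define $\bar{\mathbb{P}}$ as the (Ionescu-Tulcea-type) product $\mathbb{P}_L \otimes \bar{\mu}$ of the measure $\mathbb{P}_L$ and the kernel $\bar{\mu}$, i.e.,
the probability measure given by

$\bar{\mathbb{P}}[C] = \int_{x \in \mathcal{D}^d} \int_{(q,r) \in \mathcal{D}^{k+l}} 1_C(x, q, r)\,\bar{\mu}(x, dq, dr)\,\mathbb{P}_L(dx)$,

for $C \in \mathcal{B}(\mathcal{D}^{d+k+l})$. The reader will readily check that, so defined, $\bar{\mathbb{P}} = \mathbb{P} \otimes_L \mathbb{P}'$ satisfies all three
conditions in the statement. □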


An immediate application of Lemma A.1 is the following

Lemma A.2. Let $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ be a sequence in $\mathcal{A}$. Then, there exists a probability space and, on it,
càdlàg processes $\{L_t\}_{t\in[0,T]}$, $\{A^{(n)}_t\}_{t\in[0,T]}$, $n \in \mathbb{N}$, such that the joint law of $(L, A^{(n)})$ is $\mathbb{P}_n$, for
each $n \in \mathbb{N}$, and $\{A^{(n)}\}_{n\in\mathbb{N}}$ are independent, conditionally on $L$.

Proof. We can think of the required sequence $L, A^{(1)}, A^{(2)}, \dots$ as a stochastic process with values
in $\mathcal{D}^k$ (and $\mathcal{D}^d$ for its first component). Using the information on the joint distributions and the
requirement of conditional independence from the statement, we can apply Lemma A.1 repeatedly to
construct its (consistent) family of finite-dimensional distributions. The target spaces $\mathcal{D}^d$ and $\mathcal{D}^k$ are
Polish, so the sought-for probability space $(\Omega, \mathcal{F}, \mathbb{P})$ can now be constructed by using Kolmogorov's
extension theorem. □


A.2. An estimate.

Lemma A.3. Given $p \ge 1$, suppose that $X \in L^1_+$ and $Y \in L^p_+$ satisfy

(A.1) $\mathbb{E}[(X - r)^+] \le \mathbb{E}[Y 1_{\{X > r\}}]$, for all $r \ge 0$.

Then, $X \in L^p$ and $\|X\|_{L^p} \le p\,\|Y\|_{L^p}$.

Proof. The conclusion clearly holds for $p = 1$: it suffices to substitute $r = 0$ into (A.1). For $p > 1$,
multiplying both sides of (A.1) by $(p-1) r^{p-2}$ and integrating in $r$ over $[0, M]$, for $M > 0$, yields

$\mathbb{E}[Y (X \wedge M)^{p-1}] = \mathbb{E}\Big[Y \int_0^M (p-1) r^{p-2} 1_{\{X > r\}}\,dr\Big] \ge \mathbb{E}\Big[\int_0^M (p-1) r^{p-2} (X - r)^+\,dr\Big] \ge \tfrac{1}{p}\,\mathbb{E}[(X \wedge M)^p]$.

It remains to apply Hölder's inequality to obtain

$\tfrac{1}{p}\,\mathbb{E}[(X \wedge M)^p] \le \mathbb{E}[Y (X \wedge M)^{p-1}] \le \|Y\|_{L^p}\,\|X \wedge M\|_{L^p}^{p-1}$,

which, after dividing both sides by $\|X \wedge M\|_{L^p}^{p-1}$ and letting $M \to \infty$, completes the proof. □
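
The two pathwise integral facts used in the proof of Lemma A.3, namely $\int_0^M (p-1) r^{p-2} 1_{\{x > r\}}\,dr = (x \wedge M)^{p-1}$ and $\int_0^M (p-1) r^{p-2} (x - r)^+\,dr \ge \tfrac{1}{p}(x \wedge M)^p$ for $x \ge 0$, can be checked numerically. The following is a minimal sketch using a midpoint rule; the function names are hypothetical, and the choice $p \ge 2$ in the sample values is an assumption made only to keep the weight $r^{p-2}$ bounded near zero.

```python
import numpy as np

def weighted_integral(integrand, p, M, n=200_000):
    """Midpoint-rule approximation of int_0^M (p - 1) r^(p - 2) integrand(r) dr."""
    dr = M / n
    r = (np.arange(n) + 0.5) * dr
    return float(np.sum((p - 1) * r ** (p - 2) * integrand(r)) * dr)

def indicator_side(x, p, M):
    """Approximates int_0^M (p - 1) r^(p - 2) 1{x > r} dr, expected to equal min(x, M)^(p - 1)."""
    return weighted_integral(lambda r: (x > r).astype(float), p, M)

def positive_part_side(x, p, M):
    """Approximates int_0^M (p - 1) r^(p - 2) (x - r)^+ dr, expected to dominate min(x, M)^p / p."""
    return weighted_integral(lambda r: np.maximum(x - r, 0.0), p, M)

if __name__ == "__main__":
    M = 5.0
    for p in (2.0, 3.0):
        for x in (0.5, 2.0, 7.0):
            print(p, x, indicator_side(x, p, M), min(x, M) ** (p - 1),
                  positive_part_side(x, p, M), min(x, M) ** p / p)
```

For $x \le M$ the second integral equals $x^p / p$ exactly, while for $x > M$ it is strictly larger than $M^p / p$; this is where the inequality in the proof comes from.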


A.3. The pseudopath topology. The topology Tpp we consider on $\mathcal{D}^d$ is the following minimal
modification of the pseudopath topology introduced in [MZ84].

A path $x \in \mathcal{D}^d$ can be identified with its pseudopath, i.e., a finite measure on the product $[0, T] \times \mathbb{R}^d$
obtained as a push-forward of the “reinforced” Lebesgue measure $\mathrm{Leb} + \delta_{\{T\}}$ on $[0, T]$, where
$\delta_{\{T\}}$ denotes the Dirac mass at $\{T\}$, via the map

$[0, T] \ni t \mapsto (t, x(t)) \in [0, T] \times \mathbb{R}^d$.

With such an identification, the trace of the topology of weak convergence of measures is induced
on $\mathcal{D}^d$; we call it the pseudopath topology and denote it by Tpp. It is shown in [MZ84, Lemma 1,
p. 365] - we modify this result (and all others) minimally to fit our setting - that the pseudopath
topology is metrizable and that, for a sequence $\{x_n\}_{n\in\mathbb{N}}$ in $\mathcal{D}^d$, we have $x_n \to x \in \mathcal{D}^d$
in the pseudopath topology if and only if

(A.2) $x_n(T) \to x(T)$ and $\int_0^T b(s, x_n(s))\,ds \to \int_0^T b(s, x(s))\,ds$,

for all continuous and bounded functions $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}$. Finally, we mention a result
due to Dellacherie (see [MZ84], Lemma 1, p. 356) which simply states that the convergence in the
pseudopath topology and the convergence in the measure $\lambda + \delta_T$ coincide.
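
The difference between convergence in the measure $\lambda + \delta_T$ and uniform (or Skorokhod) convergence can be seen on a simple deterministic example, in the spirit of Example 2.23. The sketch below is an illustration under assumptions not taken from the paper: $T = 1$, a target path with a unit jump at $t_0 = 1/2$, and $n$-Lipschitz continuous ramps approximating it; the function names are hypothetical.

```python
import numpy as np

def jump_path(t, t0=0.5):
    """Cadlag path with a single unit jump at t0."""
    return (t >= t0).astype(float)

def ramp_path(t, n, t0=0.5):
    """Continuous n-Lipschitz path rising linearly from 0 to 1 on [t0 - 1/n, t0]."""
    return np.clip(n * (t - t0) + 1.0, 0.0, 1.0)

def distances(n, grid_size=200_001, T=1.0):
    """Returns (approximate L1 distance, sup distance on the grid, distance at time T)."""
    t = np.linspace(0.0, T, grid_size)
    diff = np.abs(ramp_path(t, n) - jump_path(t))
    return float(np.mean(diff) * T), float(np.max(diff)), float(diff[-1])

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, distances(n))
```

The $L^1$ distance behaves like $1/(2n)$ and the values at $T$ agree, so the ramps converge to the jump path in $\lambda + \delta_T$-measure, while the sup distance stays close to 1; a continuous, Lipschitz sequence can therefore have a discontinuous pseudopath limit.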

A.4. The Meyer-Zheng convergence. Using the pseudopath topology Tpp on $\mathcal{D}^d$, one can define
the Meyer-Zheng topology on the set of probability measures on $\mathcal{D}^d$ as the topology of weak convergence
of probability measures on the topological space $(\mathcal{D}^d, \mathrm{Tpp})$. Like the pseudopath topology Tpp on $\mathcal{D}^d$,
the Meyer-Zheng topology is metrizable, but not necessarily Polish (see p. 372 in [MZ84]); convergence
in this topology is referred to as convergence in the Meyer-Zheng sense. As shown in [MZ84], the Borel
σ-algebra generated by the pseudopath topology Tpp coincides with the canonical σ-algebra on $\mathcal{D}^d$, i.e.,
the one induced by the coordinate maps or, equivalently, by the Skorokhod topology. Moreover, the set of
all pseudopaths under Tpp is Polish.

We note the following (minimal extension) of a useful consequence of the Meyer-Zheng convergence
(see [MZ84], Theorem 5, p. 365):

Theorem A.4 (Meyer and Zheng, 1984). Let $\{\mathbb{P}^n\}_{n\in\mathbb{N}}$ be a sequence of probability measures on
$\mathcal{D}^d$ such that $\mathbb{P}^n \to \mathbb{P}$ in the Meyer-Zheng sense. Then there exists a subset $\mathcal{T} \subseteq [0, T]$ of full
Lebesgue measure, containing $T$, such that the $\mathbb{P}^n$-finite-dimensional distributions with indices in
$\mathcal{T}$ of the coordinate process converge to the corresponding finite-dimensional distributions under $\mathbb{P}$,
perhaps after a passage to a subsequence.

A.5. A criterion for compactness. One of the reasons the Meyer-Zheng topology proved to be
quite useful in probability theory and optimal stochastic control is a simple characterization of compactness
it affords. Unlike the Skorokhod topology, where compactness needs a stronger form of
equicontinuity, sets of probability measures on $\mathcal{D}^d$ are Meyer-Zheng-compact as soon as they are suitably bounded.
The following result is a compilation of two statements in [MZ84], namely Theorem 4, p. 360,
and Theorem 5, p. 365, minimally adapted to fit our setting. We remind the reader that an adapted
stochastic process $X$, defined on a filtered measurable space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t\in[0,T]})$, is said to be a quasimartingale
under the probability measure $\mathbb{P}$ if $X_t \in L^1(\mathbb{P})$, for all $t \in [0, T]$, and $\mathrm{Var}^{\mathbb{P}}[X] < \infty$,
where

(A.3) $\mathrm{Var}^{\mathbb{P}}[X] = \sup \sum_{i=1}^m \mathbb{E}^{\mathbb{P}}\Big[ \big| \mathbb{E}^{\mathbb{P}}[X_{t_i} - X_{t_{i-1}} \mid \mathcal{F}_{t_{i-1}}] \big| \Big] + \mathbb{E}^{\mathbb{P}}[|X_T|]$,

and the supremum is taken over all partitions $0 = t_0 < \dots < t_m = T$, $m \in \mathbb{N}$, of $[0, T]$.

Theorem A.5 (Meyer and Zheng, 1984). Let $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ be a sequence of probability measures on
$\mathcal{D}^N$ (equipped with the filtration generated by the coordinate maps) with the property that each
coordinate process $X^i$, $i = 1, \dots, N$, is a $\mathbb{P}_n$-quasimartingale for each $n \in \mathbb{N}$ and

$\sup_{n\in\mathbb{N}} \mathrm{Var}^{\mathbb{P}_n}[X^i] < \infty$, for all $i = 1, \dots, N$.

Then, there exists a subsequence $\{\mathbb{P}_{n_k}\}_{k\in\mathbb{N}}$ of $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$ and a probability measure $\mathbb{P}$ on $\mathcal{D}^N$ such that
$\mathbb{P}_{n_k} \to \mathbb{P}$ in the Meyer-Zheng topology.
Remark A.6. The condition $\sup_n \mathrm{Var}^{\mathbb{P}_n}[X^i] < \infty$ is easy to check if $X^i$ is a $\mathbb{P}_n$-martingale, for
each $n \in \mathbb{N}$. Indeed, in that case $\mathrm{Var}^{\mathbb{P}_n}[X^i] = \mathbb{E}^{\mathbb{P}_n}[|X^i_T|]$, with its boundedness being equivalent
to uniform $L^1$-boundedness of the process $X^i$ under all $\{\mathbb{P}_n\}_{n\in\mathbb{N}}$.

Similarly, if $X^i$ happens to be a process of finite variation, $\mathrm{Var}^{\mathbb{P}_n}[X^i]$ is bounded from above by
(a constant multiple of) the expected total variation of $X^i$. In particular, if $X^i$ is nonnegative and
nondecreasing under all $\mathbb{P}_n$, the condition we are looking for is exactly the same as in the martingale
case: $\sup_n \mathbb{E}^{\mathbb{P}_n}[|X^i_T|] < \infty$.